
MULTIMEDIA DATA TRANSMISSION SYSTEM 
DESCRIPTION 

Technical field 

This invention relates to a multimedia data 
transmission system. 

State of prior art 

Conventional multimedia servers are designed to be
accommodated on a single platform. Usually, they
consist simply of an application that runs on a
computer equipped with interface cards to the telephone
network.

In its most widely distributed form, a host server
is capable of finding data on external data servers
accessible through the same LAN, using RPC (Remote
Procedure Call) or ODBC (Open DataBase Connectivity)
type protocols.

This type of structure is suitable for accommodating
simple multimedia servers in which there is no dynamic
information. A company that would like to have a server
accommodated describes the required service logic
statically (for example, if the user types #1, "you
typed 1" should be displayed), and this logic runs
independently on the service supplier's accommodation
platform.

On the other hand, it becomes impossible to
accommodate an application that requests information
requiring close integration with one of the company's
vital databases (booking statements, etc.), and the
company must then equip itself with its own
infrastructure.

More and more companies would like to integrate
this type of multimedia service more closely with the
internal data of their industrial process. The
objective is to inform the customer in real time
whether the ticket that he has just purchased is
available, what his share portfolio is worth, etc.
These are dynamic data that are only available within
the company.

Conventional multimedia accommodation services are
not capable of satisfying these requirements, so
requesting companies are obliged to install their own
server, with the associated investments (private
telephone exchange, telephone lines, etc.).

In order to overcome the disadvantages of this type
of server, the invention proposes a multimedia data
transmission system whose purpose is to provide a
dynamic multimedia service to companies that want one,
without obliging them to purchase any hardware, while
making a server accessible to the company through
several technologies (particularly from the telephone
network and from the Internet), with fully transparent
service logic.

Description of the invention 

The system according to the invention is a
multimedia data transmission system characterized in
that it comprises a WAN, in which confidentiality and
security are not controlled from end to end, onto
which are connected a shared voice and/or video
resources host server designed to provide a dynamic
service to at least one user, and at least one call
control server located at each service supplier.

Advantageously, the host server, connected to the
network through an interface, is composed of five
subsystems:

• A protocol stack subsystem with an interface that:
- receives calls from the data network at the
exchange;
- detects incoming calls and captures caller and
called party numbers;

- detects dial tones;

- generates media data streams with arbitrary
coding-decoding;
- receives media data streams with arbitrary
coding-decoding.

• A command interpreter subsystem capable of:

- generating messages on detection of new calls to a
call control server placed at a customer;

- generating event messages;

- making use of commands originating from call
control servers placed at customers, such as:

* order to play a pre-recorded audio or video file,

* order to synthesize a voice message from a text,

* order to start waiting for a dial tone,

* order to disconnect the call,

* order for voice recognition or another application.

• A high performance transcoding resource subsystem. 

• A voice synthesis and/or video resource subsystem. 

• An audio or video sequence recording/reproduction
module subsystem.

Advantageously, each call control server located at
a customer is software that receives events signaled by
the host server and sends commands in reaction to these
events. This software can run on a computer equipped
with two network interfaces, one connected to the WAN
to communicate with the host server, and the other
connected to a company private network in order to
dialog with databases and other industrial processes
belonging to the customer.

Thus, a new generation "accommodation" service can 
be provided in which all expensive resources (voice 
synthesis cards, etc.) are shared, while the customer 
maintains control over the application and can
interface it with whatever resources he wishes. 



Brief description of the drawings 

- Figure 1 illustrates a first embodiment of the 
invention; 

- Figure 2 illustrates the dialog between an
operator server with voice recognition and the
server belonging to a company A;

- Figure 3 illustrates an example of a voice 
recognition procedure; 

- Figure 4 illustrates an embodiment of a
specialized page that reacts to voice.

Detailed presentation of an embodiment 

The invention relates to a multimedia data
transmission system that comprises a WAN, which may or
may not be public, on which confidentiality and
security are not controlled from end to end, onto
which a shared voice and/or video resources host server
is connected and provides a dynamic service to at least
one customer, and onto which at least one call control
server located at each customer is also connected.

The invention consists of placing a voice resource
in the WAN (capable of reproducing audio files,
recording them, performing voice synthesis or voice
recognition, and detecting DTMF (Dual Tone
Multi-Frequency) tones, made up of two superimposed
frequencies), equipped with a secure protocol through
which it can be remote-controlled from a wide area
network (such as the Internet).

The application that controls this voice resource
may be located anywhere on the network. Thus, the
server is a distributed platform in which expensive
resources are located in the network, and in which the
service logic is located at the customer.

Therefore, the invention can be used to share the
voice resource server located in an operator's network
between several customers that execute the service
logic on their premises. The companies simply need a
connection to the data network. The operator server is
accessible either from multimedia stations connected to
the data network, or from any telephone through a
gateway.

With the invention, the supplier of the
"accommodation" service provides call control software
to his customers, who run it locally on a machine in
their network and interface it with their critical
databases.

When a call arrives for this customer, it reaches
the shared voice resource platform. This platform
analyzes the requested number, or the "ALIAS" for IP
(Internet Protocol) calls, deduces the customer
concerned, and sends a new call notification through
the WAN to the call control application of that
customer. In particular, this application may then ask
the platform to:

- play a prerecorded audio file;
- synthesize a text;
- record a message;
- send a video sequence if the connected person has
an appropriate terminal;
- perform voice recognition.
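The routing step above (from the requested number or alias to the customer's call control server) can be sketched as follows. This is a minimal illustration assuming a static routing table held by the host server; the table contents, the alias, the 10.0.0.16 address and all class and function names are hypothetical, while the example number and 192.12.13.14 are borrowed from the travel agent example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table held by the host server:
# called number or H.323 alias -> (customer, call control server address).
ROUTES = {
    "0836011234": ("travel_agent", "192.12.13.14"),
    "fast-orders@companyA": ("company_A", "10.0.0.16"),
}


@dataclass
class NewCallNotification:
    customer: str
    control_server: str
    caller: str
    called: str


def route_incoming_call(caller: str, called: str) -> Optional[NewCallNotification]:
    """Deduce the customer from the called number or alias and build the
    notification to send over the WAN; None if no customer owns it."""
    key = called.replace(" ", "")  # normalize numbers written as "083 6011234"
    entry = ROUTES.get(key)
    if entry is None:
        return None
    customer, server = entry
    return NewCallNotification(customer, server, caller, called)
```

Unknown numbers return None, leaving the host server free to reject the call or play a generic announcement.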

The voice resource can be built on top of the H.323
protocol, so that users can be connected either through
the switched telephone network (via an STN/IP gateway)
or through the Internet.



SP 15889 C/DB 




In one advantageous embodiment, the host server is 
connected to the WAN through an Ethernet or other 
interface, and is composed of five subsystems: 

• A first subsystem, which is an H.323 protocol stack,
whose API (Application Programming Interface)
is capable of:

- detecting incoming calls and capturing the caller
and called party numbers (or H.323 ALIAS);

- detecting DTMF tones (transported in the H.245
protocol);

- generating media data streams (sound + video)
with arbitrary coding-decoding;

- receiving media data streams (sound + video) with
arbitrary coding-decoding;

• Possibly a second subsystem, which is a high
performance transcoding resource, typically a digital
signal processor card capable of transcoding between
the G.711 and G.723.1 codecs.

• Possibly a third subsystem, which is a voice synthesis
resource generating G.711 or G.723.1 type data
streams, possibly with "streaming" capacities
(division of a large file into successive small
elements of limited duration).

• Possibly a fourth subsystem, which is an audio and 
video sequence recording / reproduction module with 
"streaming" functions during reproduction. 

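The "streaming" capacity described above (splitting a large file into successive elements of limited duration) can be sketched as below. The sketch assumes a G.711 stream at 64 kbit/s, that is 8000 bytes per second, and an arbitrary 20 ms element duration; the function name is hypothetical.

```python
from typing import Iterator


def stream_chunks(data: bytes, bytes_per_second: int = 8000,
                  chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield successive slices of `data`, each covering at most `chunk_ms`
    milliseconds of audio (8000 bytes/s matches G.711 at 64 kbit/s)."""
    chunk_size = bytes_per_second * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk duration too short for this bit rate")
    for start in range(0, len(data), chunk_size):
        # The last element may be shorter than chunk_size.
        yield data[start:start + chunk_size]
```

Sending small elements lets playback start as soon as the first one arrives instead of after the whole file has been transferred.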
The action of these subsystems is coordinated by a
fifth subsystem, which is essentially a command
interpreter capable of:

- generating new call detection messages to a call
control server placed at a customer; it must
also choose the right call control server
based on the called number;

- generating event messages, for example
corresponding to DTMF tones;

- implementing commands from call control servers
placed at customers, such as:

* order to play a prerecorded audio or video
file,

* order to synthesize a voice message from a
text,

* order to wait for a DTMF tone,

* order to disconnect the call,

* order for voice recognition or another
application.
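A minimal sketch of how the command interpreter could dispatch the orders received from a call control server is shown below. The command names are hypothetical, and each handler only records what the real subsystem (file player, voice synthesis, DTMF detection, recognition) would be asked to do.

```python
from typing import Callable, Dict, List


class CommandInterpreter:
    """Toy dispatcher for commands sent by a call control server.

    Handlers only log the requested action; in a real host server they
    would drive the corresponding subsystem on the given logical channel.
    """

    def __init__(self) -> None:
        self.actions: List[str] = []
        self.handlers: Dict[str, Callable[[int, str], None]] = {
            "play_file": lambda ch, arg: self.actions.append(f"{ch}: play {arg}"),
            "synthesize": lambda ch, arg: self.actions.append(f"{ch}: say {arg!r}"),
            "wait_dtmf": lambda ch, arg: self.actions.append(f"{ch}: wait DTMF"),
            "disconnect": lambda ch, arg: self.actions.append(f"{ch}: hang up"),
            "start_recognition": lambda ch, arg: self.actions.append(f"{ch}: recognize {arg}"),
        }

    def execute(self, channel: int, command: str, arg: str = "") -> None:
        handler = self.handlers.get(command)
        if handler is None:
            raise ValueError(f"unknown command: {command}")
        handler(channel, arg)

    def dtmf_event(self, channel: int, key: str) -> Dict[str, object]:
        """Event message to be sent back to the call control server."""
        return {"event": "DTMF", "channel": channel, "key": key}
```

A table of handlers keeps the interpreter open to the additional subsystems (voice recognition, fax, etc.) mentioned below: adding one means registering one more entry.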

Calls from the switched telephone network are
translated by an STN/H.323 network gateway for
processing by the host server. The gateway function
may also be integrated in the host server.

Other subsystems (voice recognition, fax 
generation/reception, etc.) may be added to increase 
the functional richness of the complete assembly. 
In one advantageous embodiment, the call control
server located at the customer is simple software
(for example a "Windows NT" service) that receives events
signaled by the host server and sends commands in
reaction to these events. This software may run on a
computer provided with two network interfaces, one
connected to the Internet to communicate with the host
server, and the other connected to a company private
network to dialog with databases and other industrial
processes within the company.

This computer is configured so as not to forward IP
packets from the Internet to the internal network.

The customer can configure the service logic itself
using a script language (for example JavaScript or
Visual Basic), or a graphic interface.

The dialog protocol may be any secure dialog
protocol with short latency. In one embodiment,
a protocol running over standard UDP is used, in which
each information block sent has the following form:

<block><random>64 random bits</random><cipherblock>encrypted data</cipherblock></block>
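The block format above can be built and parsed as sketched below. The text does not say how the binary random bits and encrypted data are written inside the tags; hexadecimal encoding is an assumption here, as are the function names.

```python
import re
from typing import Tuple

# Assumed layout: both binary fields written as lowercase hexadecimal.
_BLOCK_RE = re.compile(
    r"<block><random>([0-9a-f]{16})</random>"
    r"<cipherblock>([0-9a-f]*)</cipherblock></block>"
)


def build_block(random_bits: bytes, ciphertext: bytes) -> str:
    """Wrap the 64 random bits and the encrypted data in a block."""
    if len(random_bits) != 8:
        raise ValueError("exactly 64 random bits (8 bytes) are expected")
    return ("<block><random>" + random_bits.hex() + "</random>"
            "<cipherblock>" + ciphertext.hex() + "</cipherblock></block>")


def parse_block(text: str) -> Tuple[bytes, bytes]:
    """Return (random_bits, ciphertext) from a received block."""
    m = _BLOCK_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError("malformed block")
    return bytes.fromhex(m.group(1)), bytes.fromhex(m.group(2))
```

In practice the random bits would come from a cryptographic source such as `secrets.token_bytes(8)` for every block sent.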

15 

The encrypted information block must have the following
structure once it has been decrypted:

<clearinfo>
<serial>serial number</serial>
<other information> ... </other information>
</clearinfo>

The information in the "cipherblock" element is
obtained by encrypting the "clearinfo" structure with
the DES (Data Encryption Standard) algorithm in CBC
(Cipher Block Chaining) mode, using the 64 random bits
for the initial exclusive OR (that is, as the
initialization vector). The sender's identity is proven
by the fact that decryption yields an intelligible
message. The receiver must memorize the last serial
number received from the sender and discard any message
received with a serial number less than or equal to it.
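The chaining described above can be sketched as follows, under stated assumptions: the Python standard library has no DES, so a toy XOR "cipher" (insecure, for illustration only) stands in for the DES block function; PKCS#7-style padding to 8-byte blocks is assumed, since the text does not specify padding; and all function names are hypothetical. A result that is not a well-formed clearinfo structure is treated as a failed authentication.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

BlockFn = Callable[[bytes], bytes]
BLOCK = 8  # DES works on 64-bit (8-byte) blocks


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_cipher(key: bytes) -> BlockFn:
    """TOY stand-in for DES (XOR with the key), NOT secure. XOR is its
    own inverse, so the same function encrypts and decrypts."""
    return lambda block: xor(block, key)


def cbc_encrypt(encrypt: BlockFn, iv: bytes, plaintext: bytes) -> bytes:
    pad = BLOCK - len(plaintext) % BLOCK          # assumed PKCS#7-style padding
    data = plaintext + bytes([pad]) * pad
    out, prev = b"", iv                           # the 64 random bits act as IV
    for i in range(0, len(data), BLOCK):
        prev = encrypt(xor(data[i:i + BLOCK], prev))  # chain with previous block
        out += prev
    return out


def cbc_decrypt(decrypt: BlockFn, iv: bytes, ciphertext: bytes) -> Optional[bytes]:
    if not ciphertext or len(ciphertext) % BLOCK:
        return None
    out, prev = b"", iv
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out += xor(decrypt(block), prev)
        prev = block
    pad = out[-1]
    if not 1 <= pad <= BLOCK or out[-pad:] != bytes([pad]) * pad:
        return None                               # wrong key: padding is garbage
    return out[:-pad]


def read_serial(decrypt: BlockFn, iv: bytes, ciphertext: bytes) -> Optional[int]:
    """Serial number of an intelligible <clearinfo> structure, else None
    (an unintelligible result means the sender is not authenticated)."""
    clear = cbc_decrypt(decrypt, iv, ciphertext)
    if clear is None:
        return None
    try:
        root = ET.fromstring(clear.decode("utf-8"))
        serial = root.findtext("serial")
        return int(serial) if root.tag == "clearinfo" and serial else None
    except (UnicodeDecodeError, ET.ParseError, ValueError):
        return None
```

With a real DES implementation, only the block functions passed to `cbc_encrypt` and `cbc_decrypt` would change; the chaining and the intelligibility check stay the same.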

The sender can protect his transmission (over
standard UDP) by sending several identical messages.
The receiver memorizes the serial number of the first
correctly received message and discards the subsequent
copies without examining them.
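The receiver-side serial number rule can be sketched as a small per-sender filter. The class name and the use of a sender identifier as the dictionary key are assumptions; the filter applies the rule once the serial number has been recovered from the clearinfo structure.

```python
from typing import Dict


class ReplayFilter:
    """Per-sender memory of the last accepted serial number. A message whose
    serial is less than or equal to the stored value is a duplicate (one of
    the identical copies sent over UDP) or a replay, and is discarded."""

    def __init__(self) -> None:
        self.last_serial: Dict[str, int] = {}

    def accept(self, sender: str, serial: int) -> bool:
        last = self.last_serial.get(sender)
        if last is not None and serial <= last:
            return False
        self.last_serial[sender] = serial
        return True
```

Because the same rule rejects both redundant copies and replayed old messages, a single stored integer per sender is enough.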






Figure 1 illustrates a first example of use:
communication through an IP interactive voice server.

In a network 10, for example the Internet, the
voice and/or video resource operator server 11 is
connected to:

- an ordinary telephone 12 through a WAN telephone
gateway 13;

- a multimedia station 14 through a two-directional
link 15 carrying an H.323, SIP, or other type of
voice data stream;

- three servers 16, 17 and 18 for companies A, B
and C.

When the operator server 11 receives a new
communication from a user, the first thing it does is
analyze the called number and deduce which company
server should manage the communication; for example,
server 16 for company A.

Company A takes fast part orders. Server 16 has the
operator server 11 play its welcome announcement,
stored in the "welcome" file on the operator server 11:
"Welcome to company A's fast order server, please press
the * key to begin".

Experienced users can interrupt this announcement by
pressing the * key.

As soon as the user presses '*', the operator
server 11 informs company A's server 16 with a "DTMF
event" message. Company A's server 16 then starts
playing the "Do_you_want_to_order" file, which contains
a recording of this phrase.

Company A's server 16 decides to use voice
commands, and orders the operator server 11 to start
recognition on the "yes, no" vocabulary. As soon as
the user says "yes", server 16 is informed by a
"Word_recognition" message.

Server 16 then asks how many parts the customer
wants to order and captures this number by voice
recognition. It then stops the voice recognition
procedure with a "Stop_recognition" command.

Finally, server 16 repeats the amount of the
order to the customer by asking the operator server 11
to synthesize the character string "You have ordered
three parts". The user then hangs up.
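Company A's scenario can be expressed as a small event-driven state machine running on server 16, as sketched below. The command names, the spoken quantity question, and the assumption that the recognizer also returns number words such as "three" are illustrative choices, not part of the described system.

```python
from typing import List, Tuple

Command = Tuple[str, str]  # (hypothetical command name, argument)
NUMBER_WORDS = {"one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"}


class FastOrderLogic:
    """Company A's service logic: turns events from the operator server
    into commands for it."""

    def __init__(self) -> None:
        self.state = "welcome"

    def on_new_call(self) -> List[Command]:
        self.state = "welcome"
        return [("play_file", "welcome")]

    def on_dtmf(self, key: str) -> List[Command]:
        if self.state == "welcome" and key == "*":
            self.state = "confirm"
            return [("play_file", "Do_you_want_to_order"),
                    ("start_recognition", "yes,no")]
        return []

    def on_word(self, word: str) -> List[Command]:
        if self.state == "confirm" and word == "yes":
            self.state = "quantity"
            return [("synthesize", "How many parts do you want to order?")]
        if self.state == "quantity" and word in NUMBER_WORDS:
            self.state = "done"
            return [("stop_recognition", ""),
                    ("synthesize", f"You have ordered {word} parts")]
        return []
```

Only events and commands cross the network; the audio, the synthesis and the recognition stay on the operator server.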

The dialog between the operator server 11 with
voice recognition, which receives an H.323, SIP or
other voice data stream, and company A's server 16 is
illustrated in figure 2.

Voice recognition procedures usually comprise two
parts:

- the first part (A) uses the voice data stream
(64 kbit/s for standard G.711 and 6.3 kbit/s for
standard G.723.1) and extracts significant
components from it (spectrum, etc.); the result
is a low rate data stream of between 4 and
8 kbit/s;

- the second part (B) attempts to recognize words
in a vocabulary from the components
transmitted by the first part A.
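The rate reduction performed by part A can be checked with a short calculation. The frame rate, number of coefficients and bits per coefficient below are assumed values chosen for illustration, not figures from the text.

```python
G711_KBPS = 64.0  # G.711 voice stream, from the text


def feature_rate_kbps(frames_per_second: int, coefficients: int,
                      bits_per_coefficient: int) -> float:
    """Bit rate, in kbit/s, of the component stream sent from part A to part B."""
    return frames_per_second * coefficients * bits_per_coefficient / 1000


# Assumed parameters: 10 ms frames (100 per second), 13 spectral
# coefficients per frame, 5 bits per coefficient.
rate = feature_rate_kbps(100, 13, 5)
print(f"{rate} kbit/s, {G711_KBPS / rate:.1f}x less than G.711")
```

With these assumptions the component stream is 6.5 kbit/s, inside the 4 to 8 kbit/s range quoted above and roughly ten times lower than a G.711 stream.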

The scheme illustrated in figure 3 shows how the 
different modules of a voice recognition procedure 
communicate with each other. 

There are two ways of creating a voice recognition
procedure in the IP interactive server:

• When the customer who is calling the company
server is not controlled by the network operator,
the A and B components have to be put on the




The company server then initiates actions as a
function of the recognized words. For example, it can
send a command message to the ActiveX component to
display another specialized page.

The following protocol is used: 

1. Connection request: connection request message (operator
server => company server)

(Implicit in TCP/IP, by opening the TCP/IP exchange mechanism)

2. Call data: transmit call data (operator server => company
server)

Called number
Calling number

3. Read sound: read a sound file (company server => operator
server)

Logical channel number

Name of the element to which the response is to be notified
Time in ms before playing the sound
File name to be played

Digit used to detect the end of the sound file
Format of the sound file (Wav, Vox, ADPCM...)
Data format
Sampling frequency

4. DTMF event message (operator server => company server)
Logical channel number

DTMF key code

5. Sound recording: recording of a message (company server =>
operator server)

Channel number 

Name of the element to which the response is to be notified 

Time before beginning the recording 

Name of the message save file 

End of recording character 

Maximum recording time 

Maximum silence time 

Save file format 

Data format 

Sampling frequency 

Send a beep to signal when the recording starts 

6. Send tone: send a tone (company server => operator server)
Channel

Name of the element to which the response is to be notified
Time before sending the tone
Dial tone
Frequency 1 
Frequency 2 
Amplitude 1 
Amplitude 2 
Tone duration 

7. Read chain: concatenate a string of characters (company
server => operator server)
Logical channel number
Name of the element to which the response is to be notified
Time before reading the sound

Character string for which the data => sound conversion is
to be made
End of file character string

Sound file format (Wav, Vox, ADPCM...)
Data format

Sampling frequency format

Mix size, so that two files can be mixed later (smooth
transition)

Breakdown type, used later by the functions that generate
numbers and times from a sound library
Character used to separate expressions in the character
string

File name resulting from the concatenation

Word field name
Sound field name
Dictionary access path

8. Disconnect user: the caller has hung up (operator server =>
company server)

Logical channel number to be disconnected

(Implicit in TCP/IP, by closing the TCP/IP exchange mechanism)

9. Disconnect server: disconnection request by the company
server software (company server => operator server)
Logical channel number to be disconnected

10. Voice synthesis:

Logical channel number

Name of the element to which the response is to be notified

Text to be converted by voice synthesis

Choice of a specific voice, if required

Speaking speed
Speaking frequency

11. Extended call (call transfer request function)
Logical channel number

Name of the element to which the response is to be notified

Transfer request time

Number to which the call is to be transferred
Call type
Number of rings before abandoning

Time to analyze the result of the transfer request

12. Start recognition (function requesting the beginning of
voice recognition)

Logical channel number

Name of the element to which the response is to be notified
Name of the words file to be analyzed
Digit used to detect the end of the sound file
Maximum recording time

Maximum silence time

Send a "beep" signaling the beginning of the recording

13. Stop recognition (function requesting the end of voice
recognition)

Logical channel number

14. Word recognition (message notifying the recognized words,
operator server => company server)

Logical channel number
Name of the element to which the response is to be notified
List of recognized words
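The messages above could be represented as follows. The text does not define a wire encoding, so one JSON object per message is an assumption here, as are the field keys and helper names; only messages 3, 4 and 14 are shown, and the direction of message 14 is inferred from the Figure 1 scenario, in which server 16 is informed by a "Word_recognition" message.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

TO_COMPANY = "operator server => company server"
TO_OPERATOR = "company server => operator server"


@dataclass
class Message:
    name: str
    direction: str
    params: Dict[str, Any] = field(default_factory=dict)

    def encode(self) -> bytes:
        # One JSON object per line: an assumed wire format, not the original one.
        return (json.dumps(asdict(self)) + "\n").encode("utf-8")

    @staticmethod
    def decode(raw: bytes) -> "Message":
        return Message(**json.loads(raw.decode("utf-8")))


def read_sound(channel: int, notify: str, file_name: str, delay_ms: int = 0,
               end_digit: str = "", file_format: str = "Wav",
               data_format: str = "", sampling_hz: int = 8000) -> Message:
    """Message 3 (Read sound), carrying the fields listed above."""
    return Message("Read sound", TO_OPERATOR, {
        "logical_channel": channel, "notify": notify, "delay_ms": delay_ms,
        "file_name": file_name, "end_digit": end_digit,
        "file_format": file_format, "data_format": data_format,
        "sampling_hz": sampling_hz})


def dtmf_event(channel: int, key: str) -> Message:
    """Message 4 (DTMF event)."""
    return Message("DTMF event", TO_COMPANY,
                   {"logical_channel": channel, "dtmf_key": key})


def word_recognition(channel: int, notify: str, words: List[str]) -> Message:
    """Message 14 (Word recognition)."""
    return Message("Word recognition", TO_COMPANY,
                   {"logical_channel": channel, "notify": notify, "words": words})
```

Carrying the direction in each message makes it easy for either end to reject a message it should never receive.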

We will now describe several other example
embodiments.

• Call from the telephone network 

A person who would like to book a journey calls
083 6011234. This number actually leads to an
STN/H.323 network gateway that converts the call into
IP data and sends it to the host voice resources
server.

The voice resources server analyzes the requested
number and deduces that the call must be controlled by
the call control server located at IP address
192.12.13.14 (at the travel agent's premises).

It therefore sends a new call message to the
travel agent's call control server. This call control
server asks it to play an announcement over background
music, briefly presenting the company and asking the
caller to press "1" to book a voyage, or "2" to leave a
message.

The person presses "1", and the host voice resources
server retransmits the event to the travel agent's call
control server.
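The travel agent's menu step might look like this on the call control server. This is a sketch only; the command names and announcement file names are hypothetical, and replaying the menu on any other key is an assumed behavior.

```python
from typing import List, Tuple

Command = Tuple[str, str]  # (hypothetical command name, argument)


def on_menu_key(key: str) -> List[Command]:
    """React to the DTMF event retransmitted by the host voice resources
    server: "1" books a voyage, "2" leaves a message."""
    if key == "1":
        return [("play_file", "booking_menu")]
    if key == "2":
        return [("record_message", "voicemail")]
    # Any other key: replay the menu announcement.
    return [("play_file", "menu")]
```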

The dialog continues. Suppose the travel agent
would like to announce the price of a particular
voyage. The call control server looks up prices and
availability in the travel agent's database, and asks
the host voice resources server to play the recorded
string "the price of your voyage is", then to
synthesize "2345", and then to play "Francs".
• Call from the Internet network

The user of an H.323 terminal clicks on a link on a
travel agent's Internet site, causing a call from the
H.323 terminal to the H.323 host server. The server
analyzes the called number and sends a new call
indication to the travel agent's call control server.

The travel agent's call control server does not 
need to be modified, and can execute the same scenario 
as in the previous case. 

But it can also choose to offer more services:
since a protocol element in the new call indication
informs it that the call is coming from the Internet,
it can suggest that a specific page be viewed, or even
order the host server to play a video sequence
describing a particular voyage.
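A sketch of how the call control server could use the origin element of the new call indication follows. The field names, the "internet" label and the command names are assumptions; the text only states that the indication tells the server where the call comes from.

```python
from dataclasses import dataclass
from typing import List, Tuple

Command = Tuple[str, str]  # (hypothetical command name, argument)


@dataclass
class NewCallIndication:
    called: str
    caller: str
    origin: str  # hypothetical encoding: "telephone" or "internet"


def first_commands(ind: NewCallIndication) -> List[Command]:
    """Same scenario for every caller; calls from the Internet are also
    offered a video sequence describing a voyage."""
    commands: List[Command] = [("play_file", "welcome")]
    if ind.origin == "internet":
        commands.append(("play_video", "voyage_presentation"))
    return commands
```

The telephone branch is unchanged, which matches the point above that the call control server does not need to be modified to serve Internet callers.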

The call is free for the Internet network user. 

• Call from another country 
If the operator y^ias installed another host voice 

resources server another country, the travel agent 

may be access ibl/e from this country. The operator 
simply reserves/a number that is forwarded to the local 
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voiqe resources server. The server continues to 
contact the company's call control server. The source 
of tne call is indicated when a new call indication is 
received, so that the call control server can 
dynamically adapt to the most suitable language when it 
is helpful to do so. 
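Language adaptation can be sketched as a lookup from the source country reported in the new call indication to a language-specific announcement file. The country codes, the directory layout and the French default are hypothetical choices.

```python
from typing import Dict

# Hypothetical mapping from the call's source country to the language
# of the announcement files.
LANGUAGE_BY_COUNTRY: Dict[str, str] = {"FR": "fr", "DE": "de", "GB": "en", "ES": "es"}


def announcement_file(name: str, source_country: str, default: str = "fr") -> str:
    """Pick the language-specific version of an announcement file,
    falling back to a default language for unknown countries."""
    lang = LANGUAGE_BY_COUNTRY.get(source_country.upper(), default)
    return f"{lang}/{name}"
```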

This solution is much less expensive than a 
conventional solution, since no international voice 
communication is necessary. 